Testing APSyn against Vector Cosine on Similarity Estimation
Abstract
In Distributional Semantic Models (DSMs), Vector Cosine is widely used to estimate the similarity between word vectors, although this measure has been observed to suffer from several shortcomings. The recent literature has proposed other methods that attempt to mitigate such biases. In this paper, we investigate APSyn, a measure that computes the extent of the intersection between the most associated contexts of two target words, weighting that intersection by context relevance. We evaluated this metric in a similarity estimation task on several popular test sets, and our results show that APSyn is highly competitive, even with respect to the results reported in the literature for word embeddings. Moreover, APSyn addresses some of the weaknesses of Vector Cosine, performing well on genuine similarity estimation as well.
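To make the contrast between the two measures concrete, here is a minimal Python sketch over sparse context vectors stored as dicts that map each context to an association weight. Vector Cosine uses every shared context; the APSyn sketch keeps only each word's top-n contexts and weights every shared context by the inverse of its average rank in the two lists. That is one reading of "weighting it by context relevance", consistent with the rank-based weighting mentioned in the related work. It is an assumption, not the authors' code. The association measure used to build the vectors (e.g. PPMI), the value of `n`, the helper names (`vector_cosine`, `top_n_ranks`, `apsyn`) and the toy vectors are all hypothetical.

```python
import math
from typing import Dict

# Hypothetical representation: each word is a sparse vector mapping a context
# to an association weight (e.g. PPMI; the weighting scheme is an assumption).
Vector = Dict[str, float]


def vector_cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two sparse context vectors."""
    dot = sum(weight * v[ctx] for ctx, weight in u.items() if ctx in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_n_ranks(vec: Vector, n: int) -> Dict[str, int]:
    """Map each of the n most associated contexts to its 1-based rank.

    Ties in weight are broken alphabetically so the ranking is deterministic.
    """
    ranked = sorted(vec.items(), key=lambda item: (-item[1], item[0]))[:n]
    return {ctx: rank for rank, (ctx, _) in enumerate(ranked, start=1)}


def apsyn(u: Vector, v: Vector, n: int = 1000) -> float:
    """Sketch of APSyn: sum over shared top-n contexts of 1 / average rank."""
    ranks_u = top_n_ranks(u, n)
    ranks_v = top_n_ranks(v, n)
    shared = ranks_u.keys() & ranks_v.keys()
    # Contexts ranked highly for both words contribute the most.
    return sum(1.0 / ((ranks_u[c] + ranks_v[c]) / 2.0) for c in shared)


if __name__ == "__main__":
    # Toy, made-up association weights for two target words.
    car = {"drive": 5.0, "road": 4.0, "engine": 3.0, "red": 1.0}
    truck = {"drive": 4.5, "engine": 4.0, "load": 3.0, "road": 2.0}
    print("cosine:", vector_cosine(car, truck))
    print("APSyn (n=3):", apsyn(car, truck, n=3))
```

With `n=3`, the shared top contexts are `drive` (rank 1 for both words) and `engine` (ranks 3 and 2), so APSyn is 1 + 1/2.5 = 1.4. `road` drops out because it falls outside the top three for `truck`, whereas Vector Cosine still credits it. Restricting the comparison to the most associated contexts is what lets APSyn ignore weak, shared-but-uninformative contexts.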
Similar resources
What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets
In this paper, we claim that Vector Cosine – which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models – can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting such intersection according to the rank of the s...
Unsupervised Measure of Word Similarity: How to Outperform Co-Occurrence and Vector Cosine in VSMs
In this paper, we claim that vector cosine – which is generally considered among the most efficient unsupervised measures for identifying word similarity in Vector Space Models – can be outperformed by an unsupervised measure that calculates the extent of the intersection among the most mutually dependent contexts of the target words. To prove it, we describe and evaluate APSyn, a variant of th...
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
Cosine Similarity Measure of Interval Valued Neutrosophic Sets
In this paper, we define a new cosine similarity between two interval valued neutrosophic sets based on Bhattacharya’s distance [19]. The notions of interval valued neutrosophic sets (IVNS, for short) will be used as vector representations in 3D-vector space. Based on the comparative analysis of the existing similarity measures for IVNS, we find that our proposed similarity measure is better an...
Improved cosine similarity measures of simplified intuitionistic sets for medicine diagnoses
Similarity measures are an important tool in pattern recognition and medical diagnosis. To overcome some disadvantages of existing cosine similarity measures for simplified neutrosophic sets (SNSs) in vector space, this paper proposes improved cosine similarity measures for SNSs based on the cosine function, including single valued neutrosophic cosine similarity measures and interval neutrosoph...
Journal title:
- CoRR
Volume abs/1608.07738
Publication date 2016